feat(schema): add resourceType dimension to events and gauges (column + index + projection) #17
lohanidamodar wants to merge 1 commit into
resourceType is the API-level plural resource family the metric belongs
to ('functions', 'sites', 'buckets', 'databases', …) — distinct from
the existing singular `resource` tag (the row's own type: 'deployment',
'function', 'bucket'). Cloud's StatsResources and StatsUsage use this
plural form everywhere in metric names (METRIC_RESOURCE_TYPE_*); making
it a first-class column lets callers slice 'storage by resourceType'
without parsing metric names.
Library changes:
- Metric.php
  - resourceType added to EVENT_COLUMNS and GAUGE_COLUMNS so it
    round-trips through extractColumns() during write.
  - String column (size 64) added to getEventSchema() and
    getGaugeSchema() so new tables include it from the start.
  - getEventIndexes() and getGaugeIndexes() include resourceType with a
    `set(0)` data-skipping index (low cardinality: ~10-20 distinct
    values across a project's lifetime, so set beats bloom_filter for
    selective filters).
  - getResourceType() accessor.
- ClickHouse.php
  - EVENT_PROJECTIONS gains `p_by_resourceType` so grouped event
    aggregations on resourceType route to a sum-projection instead of
    scanning the base table.
  - GAUGE_PROJECTIONS gains `p_by_resourceType` and the combined
    `p_by_resourceType_resource` (the latter covers the common
    "storage breakdown by resourceType × resource" panel in one
    projection scan).
  - ensureGaugeDimColumns() extended with resourceType; new
    ensureEventDimColumns() handles the same migration for existing
    events tables. Both are called from setup(), because projections
    that reference resourceType cannot materialize until the source
    column exists on both base tables.
This is a backwards-compatible additive change: existing deployments
get the new column via ALTER TABLE ADD COLUMN IF NOT EXISTS at next
setup(); existing rows have NULL until publishers start populating the
tag (cloud will in a follow-up).
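
As a rough illustration of what the column and projection changes amount to in ClickHouse, the statements below sketch the migration for both tables. The unqualified table names, the projection bodies, and the grouping keys are assumptions inferred from the description above; the SQL that `setup()` actually generates is not shown here.

```sql
-- Sketch only: table names and projection bodies are assumptions.

-- Additive column migration; a no-op when the column already exists.
ALTER TABLE events
    ADD COLUMN IF NOT EXISTS resourceType LowCardinality(Nullable(String));
ALTER TABLE gauges
    ADD COLUMN IF NOT EXISTS resourceType LowCardinality(Nullable(String));

-- Sum projection for grouped event reads on resourceType.
ALTER TABLE events
    ADD PROJECTION IF NOT EXISTS p_by_resourceType
    (
        SELECT resourceType, metric, sum(value)
        GROUP BY resourceType, metric
    );

-- Latest-snapshot projection for the resourceType x resource gauge panel.
ALTER TABLE gauges
    ADD PROJECTION IF NOT EXISTS p_by_resourceType_resource
    (
        SELECT resourceType, resource, argMax(value, time)
        GROUP BY resourceType, resource
    );

-- ADD PROJECTION only covers newly written parts; build it for existing parts.
ALTER TABLE events MATERIALIZE PROJECTION p_by_resourceType;
ALTER TABLE gauges MATERIALIZE PROJECTION p_by_resourceType_resource;
```

The column statements have to run before the projection statements, which is the ordering constraint the commit message describes for `setup()`.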
Closing — |
Greptile Summary
This PR adds
Confidence Score: 3/5
The change needs fixes before merge because upgraded ClickHouse schemas and daily rollups do not fully support the new dimension, and the metric constant tests are stale. The review covers both changed files and identifies several concrete compatibility gaps in migration, projection, daily aggregation, schema alignment, and tests.
src/Usage/Adapter/ClickHouse.php and tests/Usage/MetricTest.php

             $sql = "ALTER TABLE {$gaugesTable} "
                 . 'ADD COLUMN IF NOT EXISTS service LowCardinality(Nullable(String)), '
    -            . 'ADD COLUMN IF NOT EXISTS resource LowCardinality(Nullable(String))';
    +            . 'ADD COLUMN IF NOT EXISTS resource LowCardinality(Nullable(String)), '
    +            . 'ADD COLUMN IF NOT EXISTS resourceType LowCardinality(Nullable(String))';

             $this->query($sql);
         }
    +
    +    /**
    +     * Backfill late-added dim columns on an existing events table. Same
    +     * reasoning as ensureGaugeDimColumns — CREATE TABLE IF NOT EXISTS won't
    +     * pick up columns added to the schema after the table was first created,
    +     * and a per-dim projection on `resourceType` cannot be materialized until
    +     * the source column exists on the base table.
    +     */
    +    private function ensureEventDimColumns(): void
    +    {
    +        $eventsTable = $this->escapeIdentifier($this->database)
    +            . '.' . $this->escapeIdentifier($this->getEventsTableName());
    +
    +        $sql = "ALTER TABLE {$eventsTable} "
    +            . 'ADD COLUMN IF NOT EXISTS resourceType LowCardinality(Nullable(String))';
    +
    +        $this->query($sql);

This migration only adds the new resourceType columns. On existing ClickHouse tables, createTable() is skipped by CREATE TABLE IF NOT EXISTS, so the new index-resourceType definitions from Metric::get*Indexes() are never added. After an upgrade, WHERE resourceType = ... queries can still scan without the promised skip index, while fresh installs get different behavior.
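
One way to close the gap described in this comment would be an index migration alongside the column migration. The sketch below assumes unqualified table names, a hypothetical index name `idx_resourceType`, and a granularity of 1; none of these are taken from the library.

```sql
-- Hypothetical follow-up migration; index name and GRANULARITY are assumptions.

-- Add the skip index on upgraded tables; a no-op when it already exists.
-- set(0) keeps every distinct value per index block with no cap.
ALTER TABLE events
    ADD INDEX IF NOT EXISTS idx_resourceType resourceType TYPE set(0) GRANULARITY 1;
ALTER TABLE gauges
    ADD INDEX IF NOT EXISTS idx_resourceType resourceType TYPE set(0) GRANULARITY 1;

-- ADD INDEX only applies to newly written parts; build it for existing parts.
ALTER TABLE events MATERIALIZE INDEX idx_resourceType;
ALTER TABLE gauges MATERIALIZE INDEX idx_resourceType;
```

Running these after the `ADD COLUMN` step would give upgraded and fresh installs the same index set.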

    +        $sql = "ALTER TABLE {$eventsTable} "
    +            . 'ADD COLUMN IF NOT EXISTS resourceType LowCardinality(Nullable(String))';

The events base table gets resourceType, but the daily events table, daily materialized view, DAILY_COLUMNS, and findDaily() grouping still omit it. Calls that use the daily APIs with a resourceType filter are rejected by validateDailyAttributeName(), and closed-day event sums filtered by resourceType cannot use the daily rollup because the router treats the column as not daily-safe. This leaves resourceType incomplete as a first-class events dimension for historical usage reads.
Summary

Adds `resourceType` as a first-class dimension column on the events and gauges tables, with a data-skipping index and dedicated per-dim projections so that grouped/filtered reads on `resourceType` route to a projection instead of a base-table scan.

What `resourceType` is and why

`resourceType` is the API-level plural family the metric belongs to: `'functions'`, `'sites'`, `'buckets'`, `'databases'`, etc. It is distinct from the existing singular `resource` tag, which is the row's own type: `'deployment'`, `'function'`, `'bucket'`, `'collection'`, `'file'`.

Cloud's `StatsResources` and `StatsUsage` already use the plural form everywhere in metric names (`METRIC_RESOURCE_TYPE_DEPLOYMENTS`, `METRIC_RESOURCE_TYPE_BUILDS_STORAGE`, etc.), so making it a first-class column lets callers slice "storage by resourceType" or "events by resourceType" without parsing metric names.

Library changes

`Metric.php`
- `EVENT_COLUMNS` and `GAUGE_COLUMNS` gain `'resourceType'` so it round-trips through `extractColumns()` during write.
- `getEventSchema()` / `getGaugeSchema()` add a `string(64)` column. (64 is plenty: current values are 8-12 chars; the cap is documentation, not a real ceiling.)
- `getEventIndexes()` / `getGaugeIndexes()` index `resourceType` with `set(0)` since cardinality is bounded (~10-20 distinct values across a project lifetime). `set` outperforms `bloom_filter` for low-cardinality columns with selective `WHERE resourceType = 'X'`.
- `getResourceType()` accessor.

`ClickHouse.php`
- `EVENT_PROJECTIONS` gains `p_by_resourceType`: grouped event aggregations on `resourceType` will use a SUM projection.
- `GAUGE_PROJECTIONS` gains:
  - `p_by_resourceType`: single-dim, for "latest snapshot per resourceType" via `argMax(value, time)`.
  - `p_by_resourceType_resource`: covers the common "storage breakdown by resourceType × resource" panel in one projection scan.
- `ensureGaugeDimColumns()` extended with `resourceType`; new `ensureEventDimColumns()` does the equivalent migration on existing events tables. Both are invoked from `setup()` before projections are added, because projections that reference `resourceType` cannot materialize until the source column exists on the base table.

Performance

- `SELECT … FROM events WHERE resourceType = 'functions' GROUP BY metric`: `p_by_resourceType` SUM projection
- `SELECT … FROM gauges WHERE resourceType = 'sites' GROUP BY resource`: argMax `p_by_resourceType_resource` projection
- `SELECT resourceType, sum(value) FROM events GROUP BY resourceType`: `p_by_resourceType` projection

For very selective `WHERE resourceType = 'X'` queries without `GROUP BY`, the `set(0)` skip index prunes granules before the read.

Compatibility

- `EVENT_COLUMNS` / `GAUGE_COLUMNS` gain one entry; `extractColumns()` is strict and rejects unknown keys, so callers that already pass `resourceType` in tags begin populating the column immediately. Callers that don't are unchanged.
- Existing deployments get the new column at the next `setup()` call via `ALTER TABLE ADD COLUMN IF NOT EXISTS`. Existing rows have `NULL` until publishers backfill or new writes land.
- New projections go through the `addProjection()` helper, so re-running `setup()` is safe.

Test plan

- `composer check` clean (PHPStan max + baseline)
- `composer lint` clean (Pint)
- `composer test` against ClickHouse confirms `setup()` creates the column, new projections materialize, and `addBatch` / `find` round-trip the field through the tag map
- `SELECT * FROM system.projection_parts WHERE table = 'gauges' AND name = 'p_by_resourceType'` after `setup()` shows parts
- `EXPLAIN PLAN indexes=1 SELECT … WHERE resourceType = 'X' GROUP BY metric` shows the projection in the query plan

Follow-up

A companion PR on `appwrite-labs/cloud` will:
- Add `'resourceType'` to `VALID_DIMENSIONS` on `/v1/usage/gauges` and `/v1/usage/events`
- Populate the tag from `StatsResources` (per-bucket → `resourceType='buckets'`, per-function → `'functions'`, per-site → `'sites'`, per-collection/database → `'databases'`) and from `StatsUsage` (via `inferResourceTypeFromMetric` mirroring the existing `inferServiceFromMetric`)
- Add a `resourceType` field to the `UsageDataPoint` response model

That lands once this library PR ships and is tagged.
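
The two manual checks in the test plan could be run roughly as follows. The unqualified table names and the select list in the `EXPLAIN` query are illustrative assumptions, since the original query text is elided; the filter value `'functions'` is one of the example values named above.

```sql
-- Check that the gauge projection has materialized parts after setup().
SELECT *
FROM system.projection_parts
WHERE table = 'gauges' AND name = 'p_by_resourceType';

-- Inspect index and projection usage for a filtered, grouped event read.
-- The select list here is an assumption for illustration.
EXPLAIN PLAN indexes = 1
SELECT metric, sum(value)
FROM events
WHERE resourceType = 'functions'
GROUP BY metric;
```

An empty result from the first query right after `setup()` may only mean the projection has not finished materializing yet, so it is worth re-running before treating it as a failure.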